VISION: Robust and Interpretable Code Vulnerability Detection Leveraging Counterfactual Augmentation

https://doi.org/10.1609/aies.v8i1.36592

Egea, David; Halder, Barproda; Dutta, Sanghamitra (October 2025, Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society)

Automated detection of vulnerabilities in source code is an essential cybersecurity challenge, underpinning trust in digital systems and services. Graph Neural Networks (GNNs) have emerged as a promising approach as they can learn the structural and logical code relationships in a data-driven manner. However, the performance of GNNs is severely limited by training data imbalances and label noise. GNNs can often learn “spurious” correlations due to superficial code similarities in the training data, leading to detectors that do not generalize well to unseen real-world data. In this work, we propose a new unified framework for robust and interpretable vulnerability detection, which we call VISION, to mitigate spurious correlations by systematically augmenting a counterfactual training dataset. Counterfactuals are samples with minimal semantic modifications that have opposite prediction labels. Our complete framework includes: (i) generating effective counterfactuals by prompting a Large Language Model (LLM); (ii) targeted GNN model training on synthetically paired code examples with opposite labels; and (iii) graph-based interpretability to identify the truly crucial code statements relevant for vulnerability predictions while ignoring the spurious ones. We find that our framework reduces spurious learning and enables more robust and generalizable vulnerability detection, as demonstrated by improvements in overall accuracy (from 51.8% to 97.8%), pairwise contrast accuracy (from 4.5% to 95.8%), and worst-group accuracy (from 0.7% to 85.5%) on the widely popular Common Weakness Enumeration (CWE)-20 vulnerability. We also demonstrate improvements using our proposed metrics, namely, intra-class attribution variance, inter-class attribution distance, and node score dependency.
We provide a new benchmark for vulnerability detection, CWE-20-CFA, comprising 27,556 samples from functions affected by the high-impact and frequently occurring CWE-20 vulnerability, including both real and counterfactual examples. Furthermore, our approach enhances societal objectives of transparent and trustworthy AI-based cybersecurity systems through interactive visualization for human-in-the-loop analysis.
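
The abstract reports overall, pairwise contrast, and worst-group accuracy without defining them. The sketch below shows one common way such metrics could be computed; it is an illustration under stated assumptions, not the authors' evaluation code. It assumes that pairwise contrast accuracy counts an (original, counterfactual) pair as correct only when both members are classified correctly, and that worst-group accuracy is the minimum per-group accuracy. The function names, the group labels, and the toy data are all hypothetical.

```python
from collections import defaultdict


def overall_accuracy(labels, preds):
    """Fraction of samples whose prediction matches the label."""
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)


def pairwise_contrast_accuracy(pairs, labels, preds):
    """Fraction of (original, counterfactual) index pairs in which BOTH
    members are classified correctly (assumed definition)."""
    hits = sum(
        1 for i, j in pairs
        if preds[i] == labels[i] and preds[j] == labels[j]
    )
    return hits / len(pairs)


def worst_group_accuracy(groups, labels, preds):
    """Minimum accuracy over groups (assumed definition); `groups` assigns
    each sample a hypothetical group identifier."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for g, y, p in zip(groups, labels, preds):
        total[g] += 1
        correct[g] += int(y == p)
    return min(correct[g] / total[g] for g in total)


# Toy data (made up for illustration): 1 = vulnerable, 0 = benign.
labels = [1, 0, 1, 0, 1, 0]
preds = [1, 0, 1, 1, 0, 0]
pairs = [(0, 1), (2, 3), (4, 5)]          # each sample paired with its counterfactual
groups = ["a", "a", "b", "b", "b", "a"]   # hypothetical group assignment

print(overall_accuracy(labels, preds))
print(pairwise_contrast_accuracy(pairs, labels, preds))
print(worst_group_accuracy(groups, labels, preds))
```

On the toy data, four of six samples are correct but only one of three pairs is fully correct, which illustrates how a detector can reach moderate overall accuracy while failing to distinguish a sample from its counterfactual.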

Free, publicly-accessible full text available October 15, 2026
